Live R Coding Session

Author: Jeremy Springman

Affiliation: University of Pennsylvania

Published: July 23, 2024

Introducing the RStudio Layout

Before using R to illustrate basic programming concepts and data analysis tools, we will get familiar with the RStudio layout.

RStudio contains four panes

RStudio has four primary panels that will help you interact with your data. We will use the default layout of these panels.

  • Source panel: Top left
    • Edit files to create ‘scripts’ of code
  • Console panel: Bottom left
    • Accepts code as input
    • Displays output when we run code
  • Environment panel: Top right
    • Everything that R is holding in memory
    • Objects that you create in the console or source panels will appear here
    • You can clear the environment with the broom icon
  • Viewer panel: Bottom right (its tabs include Files, Plots, Packages, Help, and Viewer)
    • View graphics that you generate
    • Navigate files

Illustration

Let’s use these panels to create and interact with data.

Console:

  • Perform a calculation: type 2 + 2 into the console panel and hit ENTER
  • Create and store an object: type sum = 2 + 2 into the console panel and hit ENTER

Source:

  • Start an R script: open a new .R file (File > New File > R Script, or the button in the top left below “File”)
  • Create and store an object: type sum = 2 + 3 into the source panel and press Ctrl+Enter (Cmd+Enter on macOS) to run the line

Environment:

  • Confirm that the object sum is stored in our environment
  • Use rm(sum) to clear the object from the environment
  • Clear the environment with the broom icon
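
The environment steps can also be run as code. The sketch below uses the base R functions ls(), exists() and rm(). rm(list = ls()) does roughly what the broom icon does. Note that a plain exists("sum") would still find R's built-in sum() function, so the check below looks only in the global environment.

```r
# Store an object, then list what is in the global environment
sum = 2 + 3
ls()

# Is there an object called sum in the global environment? (TRUE)
exists("sum", envir = globalenv(), inherits = FALSE)

# Remove it, then check again (FALSE)
rm(sum)
exists("sum", envir = globalenv(), inherits = FALSE)

# R's built-in sum() function was never affected
sum(1, 2, 3)

# Roughly the code equivalent of the broom icon: remove every object
rm(list = ls())
```

Naming an object sum works, but it hides the built-in function's name in the Environment panel. A more specific name such as my_sum avoids confusion.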

Viewer:

  • Navigate through your computer’s files in the Files tab
  • Create a plot by running plotting code from the source panel; the plot appears in the Plots tab
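
To try the plotting step, run lines like the ones below from the source panel. This is a minimal sketch with made-up values. plot() is a base R function, and the result should appear in the Plots tab of the bottom-right panel.

```r
# Made-up example values
x = c(1, 2, 3, 4, 5)
y = x^2
# Draw a scatter plot with a title and axis labels
plot(x, y, main = "My first plot", xlab = "x", ylab = "x squared")
```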

Review of Basic Programming Concepts

Now that we understand the layout, we are ready to review the concepts covered in Module 2 Week 2.2. These concepts will help us understand what is happening when we create and manipulate data.

Objects: where values are saved in R

“Object” is a generic term for anything that R stores in the environment. This can include anything from an individual number or word, to lists of values, to entire datasets.

Importantly, objects belong to different “classes” depending on the type of values that they store.

# Create a numeric object
my_number = 5.6
# Check the class
class(my_number)
[1] "numeric"
# Perform a calculation
my_number + 5
[1] 10.6

The class of an object determines the type of operations you can perform on it. Some operations can only be run on numeric objects (numbers).

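# Create a character object (the quotation marks make it text)
my_number = "5.6"
# Check the class
class(my_number)
[1] "character"
# Try to perform a calculation: both lines return an error
my_number + 5
Error in my_number + 5 : non-numeric argument to binary operator
round(my_number)
Error in round(my_number) : non-numeric argument to mathematical function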

R contains functions, such as as.numeric() and as.character(), that convert objects from one class to another.

# Convert character to numeric
my_number = as.numeric("5")
class(my_number)
[1] "numeric"
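# R is only so smart: text it cannot read as a number becomes NA,
# with the warning "NAs introduced by coercion"
my_number = as.numeric("five")
class(my_number)
[1] "numeric"
my_number
[1] NA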
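
A failed conversion produces NA instead of stopping your code, so it is worth checking for failures. The sketch below uses base R's is.na() with made-up values. suppressWarnings() only hides the coercion warning, so use it after you have decided the NAs are expected.

```r
# Two values can be read as numbers; one cannot
values = c("5", "5.6", "five")
converted = suppressWarnings(as.numeric(values))
converted
# TRUE marks the values that failed to convert
is.na(converted)
# Count the failures
sum(is.na(converted))
```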

Vectors

  • Store a series of values of the same class
  • Create a vector using c()
  • Perform calculations on every value at once
# Create a numeric vector
numeric_vector = c(6, 11, 13, 31)
# Print the vector
print(numeric_vector)
[1]  6 11 13 31
# Check the class
class(numeric_vector)
[1] "numeric"
# Calculate the mean
mean(numeric_vector)
[1] 15.25
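# Create a character vector (the quotation marks make each value text)
character_vector = c("6", "11", "13", "31")
# Check the class
class(character_vector)
[1] "character"
# mean() cannot average text: it returns NA with the warning
# "argument is not numeric or logical: returning NA"
mean(character_vector)
[1] NA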
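
The NA goes away once the text is converted to numbers with as.numeric(), introduced earlier. A minimal sketch:

```r
character_vector = c("6", "11", "13", "31")
# Convert the text to numbers, then check the class
converted_vector = as.numeric(character_vector)
class(converted_vector)
# Now the mean can be calculated
mean(converted_vector)
# Arithmetic on a vector applies to every value at once
converted_vector * 2
```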

Common classes of objects include:

  • Numerics are numbers like 5.6 or 15.25
  • Characters are text or strings like "hello world" and "welcome to R"
  • Factors are categorical variables that can take only a fixed set of values, called levels (for example, "Year I", "Year II", and "Year III")
  • Logicals are either TRUE or FALSE
  • Data frames are objects where the rows correspond to observations and the columns correspond to variables that describe the observations
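
# Numeric
sum = 2 + 2
print(3/2)
[1] 1.5
# Character
print("hello world")
[1] "hello world"
# Logical
print(2 > 3)
[1] FALSE
# Factor: levels are the unique values, sorted alphabetically by default
size = factor(c("small", "large", "small"))
levels(size)
[1] "large" "small"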
Basic programming concepts

  • Conditionals:
    • Testing for equality in R is done using ==. For example, 2 + 1 == 3 will return TRUE
    • Comparison operators: < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to), and != (not equal to). For example, 3 + 5 <= 1 will return FALSE
    • Logical operators: & represents “and” while | represents “or.” For example, (2 + 1 == 3) & (2 + 1 == 4) returns FALSE because & requires both clauses to be TRUE, and the second clause is FALSE
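
You can try these operators directly in the console. In the sketch below, each comment states the value R returns. The if/else at the end shows how such a test controls which code runs; the variable names are hypothetical.

```r
# Equality uses == (a single = assigns a value)
2 + 1 == 3                    # TRUE
# Comparison operators
3 + 5 <= 1                    # FALSE
4 != 5                        # TRUE
# & ("and") is TRUE only when both sides are TRUE
(2 + 1 == 3) & (2 + 1 == 4)   # FALSE
# | ("or") is TRUE when at least one side is TRUE
(2 + 1 == 3) | (2 + 1 == 4)   # TRUE

# A conditional runs different code depending on a test
score = 7
if (score > 5) {
  result = "above 5"
} else {
  result = "5 or below"
}
print(result)
```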

Loading Packages

  • R gives you access to thousands of “packages” that are created by users
  • Packages contain datasets and bundles of code called “functions” that can execute specific tasks
  • Use install.packages() to install a package (once per computer)
    • Insert the name of the package in quotation marks
    • Start by installing the dplyr package: install.packages("dplyr")
  • Use library() to load an installed package at the start of each R session, for example library(dplyr)
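
Here is a minimal sketch of installing and loading dplyr. The requireNamespace() check skips the download if dplyr is already installed. The repos argument names a CRAN mirror, which a script run outside RStudio may need. The install step requires an internet connection.

```r
# Install dplyr only if it is not already installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
# Load the package; repeat this in every new R session
library(dplyr)
```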

Loading Data into R

  • Load data into your environment by “reading in” a spreadsheet
  • Save the spreadsheet as a .csv file first (read.csv() reads plain-text .csv files, not Excel .xlsx files)
  • Use read.csv() to pull data from a spreadsheet on your hard drive into your R/RStudio environment
    • Within the parentheses, add the full file path to the .csv file, in quotation marks
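
This is a minimal sketch of the read.csv() step. So that it runs on any computer, it first writes a small made-up .csv file to a temporary folder. In practice you would skip that step and point read.csv() at your own file. On Windows, write paths with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence in R strings.

```r
# Write a tiny made-up .csv to a temporary folder so the sketch runs anywhere
csv_path = file.path(tempdir(), "example_survey.csv")
writeLines(c("id,moved,score", "1,0,3.5", "2,1,4.0", "3,1,2.5"), csv_path)

# Read it into a data frame
survey = read.csv(csv_path)
head(survey)
str(survey)

# With your own file, give the full path in quotation marks (hypothetical path):
# survey = read.csv("C:/Users/yourname/Documents/survey.csv")
```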

Cleaning Data

Index Variables

Additive Scale
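
An additive scale combines several related items into one index by adding them up. The sketch below assumes three hypothetical survey items (q1, q2, q3) measured on the same 1 to 5 scale. rowSums() is base R.

```r
# Hypothetical responses to three items on the same 1-5 scale
survey = data.frame(
  q1 = c(1, 2, 3, 4),
  q2 = c(2, 2, 4, 4),
  q3 = c(1, 3, 3, 5)
)
items = c("q1", "q2", "q3")
# Add the items together for each respondent (row)
survey$additive_index = rowSums(survey[, items])
survey$additive_index
```

If a respondent skipped an item, rowSums() returns NA for that row. Setting na.rm = TRUE makes the missing item count as zero, which pulls that person's score down, so decide deliberately.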

Averaged Z-Scores
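
When items are on different scales, adding raw values lets the item with the largest spread dominate the index. An alternative is to standardize each item (subtract its mean, then divide by its standard deviation) and average the resulting z-scores. The sketch below uses hypothetical items and the base R functions scale() and rowMeans().

```r
# Hypothetical responses to three items
survey = data.frame(q1 = c(1, 2, 3, 4), q2 = c(2, 2, 4, 4), q3 = c(1, 3, 3, 5))
items = c("q1", "q2", "q3")
# scale() converts each column to z-scores: (value - mean) / sd
z_scores = scale(survey[, items])
# Average the z-scores across items for each respondent
survey$z_index = rowMeans(z_scores)
round(survey$z_index, 3)
```

Because every z-scored item has mean 0, the averaged index also has mean 0. Positive values are above the sample average and negative values are below it.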

Regression

Differences across groups

Binary


  Year I  Year II Year III 
     327      277      221 

  0   1 
327 221 
               Bivariate           Multivariate        Interaction
(Intercept)    −0.150*** (0.036)   −0.142** (0.051)    −0.184** (0.061)
moved          0.234*** (0.045)    0.273*** (0.055)    0.334*** (0.073)
year                               −0.046 (0.055)      0.038 (0.086)
moved × year                                           −0.142 (0.112)
Num.Obs.       809                 534                 534
R2 Adj.        0.032               0.045               0.046
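
The three columns follow a common pattern: a bivariate model, a model that adds a control, and a model with an interaction. The sketch below fits that pattern with base R's lm() on simulated data. Only the variable names moved and year come from the table; the outcome name and all values are hypothetical. In R's formula syntax, moved * year expands to moved + year + moved:year. lm() drops any row with a missing value in the model's variables, which is one likely reason Num.Obs. differs across columns.

```r
set.seed(1)
n = 200
# Simulated data: 0/1 indicators for moved and year, plus a numeric outcome
dat = data.frame(
  moved = rbinom(n, 1, 0.5),
  year = rbinom(n, 1, 0.5)
)
dat$outcome = -0.15 + 0.25 * dat$moved + rnorm(n)

# Bivariate: outcome on moved only
m_bivariate = lm(outcome ~ moved, data = dat)
# Multivariate: add year as a control
m_multivariate = lm(outcome ~ moved + year, data = dat)
# Interaction: moved * year is shorthand for moved + year + moved:year
m_interaction = lm(outcome ~ moved * year, data = dat)
summary(m_interaction)
```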

Continuous-ish


  0   1   2 
327 277 221 
               Bivariate           Multivariate        Interaction
(Intercept)    −0.150*** (0.036)   −0.121** (0.045)    −0.169** (0.056)
moved          0.234*** (0.045)    0.228*** (0.045)    0.301*** (0.067)
year                               −0.029 (0.027)      0.019 (0.042)
moved × year                                           −0.080 (0.054)
Num.Obs.       809                 809                 809
R2 Adj.        0.032               0.032               0.033

More complex


  0   1   2 
327 277 221 
               Bivariate           Multivariate        Interaction
(Intercept)    −0.150*** (0.036)   −0.114* (0.046)     −0.184** (0.060)
moved          0.234*** (0.045)    0.232*** (0.045)    0.334*** (0.072)
year2                              −0.060 (0.049)      0.077 (0.092)
year3                              −0.053 (0.054)      0.038 (0.085)
moved × year2                                          −0.194+ (0.109)
moved × year3                                          −0.142 (0.110)
Num.Obs.       809                 809                 809
R2 Adj.        0.032               0.031               0.033
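
In the "More complex" columns, year enters as separate indicators (year2, year3) rather than as a single slope. The sketch below shows how that happens in R, using simulated, hypothetical data. A numeric year gets one coefficient. Converting year to a factor makes lm() create one dummy for each level except the first, which becomes the baseline.

```r
set.seed(2)
n = 300
# Simulated data: year takes the values 1, 2, or 3
dat = data.frame(
  moved = rbinom(n, 1, 0.5),
  year = sample(1:3, n, replace = TRUE)
)
dat$outcome = -0.15 + 0.25 * dat$moved + rnorm(n)

# Numeric year: a single slope per one-unit change in year
m_numeric = lm(outcome ~ moved * year, data = dat)

# Factor year: separate coefficients for levels 2 and 3, relative to level 1
dat$year = factor(dat$year)
m_factor = lm(outcome ~ moved * year, data = dat)
names(coef(m_factor))
```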

Differences over time

Cross-Sectional

               Bivariate           Multivariate        Interaction
(Intercept)    −0.150*** (0.026)   −0.148*** (0.030)   −0.150*** (0.037)
moved          0.232*** (0.032)    0.232*** (0.032)    0.234*** (0.046)
time                               −0.004 (0.031)      −0.001 (0.052)
moved × time                                           −0.004 (0.065)
Num.Obs.       1634                1634                1634
R2 Adj.        0.030               0.030               0.029

Fixed Effects

               Simple              Fixed Effects       Interaction
(Intercept)    0.005 (0.022)       −0.330 (0.332)      −0.374 (0.333)
time           −0.005 (0.031)      −0.002 (0.023)      0.000 (0.040)
moved                                                  0.044 (0.470)
moved × time                                           −0.004 (0.049)
Num.Obs.       1634                1634                1634
R2 Adj.        −0.001              0.442               0.442
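
The fixed-effects columns compare the same people over time. The sketch below assumes a long-format panel with a hypothetical person identifier, id, and uses simulated data. Adding factor(id) to lm() gives every person their own intercept, so the time coefficient is estimated only from changes within people.

```r
set.seed(3)
n_people = 100
# Hypothetical panel: each person is observed at time 0 and time 1
panel = data.frame(
  id = rep(1:n_people, each = 2),
  time = rep(c(0, 1), times = n_people)
)
# Stable person-level differences plus random noise
person_effect = rnorm(n_people)
panel$outcome = person_effect[panel$id] + rnorm(nrow(panel), sd = 0.5)

# Simple model: ignores that rows come from the same people
m_simple = lm(outcome ~ time, data = panel)
# Fixed effects: one intercept shift per person via factor(id)
m_fe = lm(outcome ~ time + factor(id), data = panel)
coef(m_fe)["time"]
```

With person intercepts absorbing stable differences between people, adjusted R-squared typically jumps, as it does between the Simple and Fixed Effects columns above.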

Appendix

Strings

Some operations can only be run on character (string) objects.

# Create a character object
my_number = "5.6"
# Check the class
class(my_number)
[1] "character"
# Split the text at the decimal point ("\\." matches a literal period)
stringr::str_split(my_number, "\\.")
[[1]]
[1] "5" "6"
# Create a numeric object
my_number = 5.6
# Check the class
class(my_number)
[1] "numeric"
# Try the same string functions on the numeric object
stringr::str_split(my_number, "\\.")
stringr::str_length(my_number)
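
If stringr is not installed, base R has equivalents: strsplit(), nchar(), and toupper(). Setting fixed = TRUE tells strsplit() to treat the period literally. The stringr call above needs "\\." because its pattern is a regular expression.

```r
text_value = "5.6"
# Split at the period; fixed = TRUE means "." is not a regular expression
strsplit(text_value, ".", fixed = TRUE)
# Count characters
nchar(text_value)
# Change case
toupper("welcome to r")
```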